Gaussian processes are arguably the most important class of models in spatial statistics. They encode prior information about the function being modeled and can be used for exact or approximate Bayesian inference. In many applications, particularly in the physical sciences and engineering, but also in areas such as geostatistics and neuroscience, invariance to symmetries is one of the most fundamental forms of prior information one can consider. The invariance of a Gaussian process's covariance to such symmetries gives rise to the most natural generalization of the concept of stationarity to such spaces. In this work, we develop constructive and practical techniques for building stationary Gaussian processes on a very large class of non-Euclidean spaces that arise in the context of symmetries. Our techniques make it possible to (i) compute covariance kernels and (ii) sample from prior and posterior Gaussian processes defined on such spaces, both in a practical manner. This work is split into two parts, each involving different technical considerations: part one studies compact spaces, while part two studies non-compact spaces possessing certain structure. Our contributions make the non-Euclidean Gaussian process models we study compatible with the well-understood computational techniques available in standard Gaussian process software packages, making them accessible to practitioners.
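On compact spaces, one concrete way to realize such a construction is a truncated expansion in Laplacian eigenfunctions with a Matérn-type spectrum. The following NumPy sketch illustrates this on the circle S^1; the smoothness, length scale, and truncation level are illustrative choices rather than values taken from this work.

```python
import numpy as np

# Minimal sketch: a stationary Matérn-type Gaussian process on the circle S^1, built from a
# truncated expansion in the Laplacian eigenfunctions {cos(n x), sin(n x)}.
nu, kappa, N = 1.5, 0.5, 50                                   # smoothness, length scale, truncation
n = np.arange(1, N + 1)
spectrum = (2.0 * nu / kappa**2 + n**2) ** (-(nu + 0.5))      # Matérn-type spectral decay, d = 1
spectrum0 = (2.0 * nu / kappa**2) ** (-(nu + 0.5))            # constant (n = 0) term

def kernel(x, y):
    """Stationary kernel k(x, y) = a_0 + sum_n a_n cos(n (x - y))."""
    d = np.subtract.outer(x, y)
    return spectrum0 + np.tensordot(spectrum, np.cos(np.multiply.outer(n, d)), axes=(0, 0))

def sample_prior(x, rng):
    """One prior draw f(x) = sqrt(a_0) xi_0 + sum_n sqrt(a_n) (xi_n cos(n x) + eta_n sin(n x))."""
    xi0 = rng.standard_normal()
    xi, eta = rng.standard_normal(N), rng.standard_normal(N)
    basis_c = np.cos(np.multiply.outer(n, x))
    basis_s = np.sin(np.multiply.outer(n, x))
    return np.sqrt(spectrum0) * xi0 + np.sqrt(spectrum) @ (xi[:, None] * basis_c + eta[:, None] * basis_s)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 200)
K = kernel(x, x)              # covariance matrix usable with standard GP software
f = sample_prior(x, rng)      # one approximate prior sample
```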
Bayesian optimization is a data-efficient technique that can be used for control parameter tuning, parametric policy adaptation, and structure design in robotics. Many of these problems require optimizing functions defined on non-Euclidean domains such as spheres, rotation groups, or spaces of positive-definite matrices. To do so, one must place a Gaussian process prior, or equivalently define a kernel, on the space of interest. Effective kernels typically reflect the geometry of the spaces they are defined on, but designing them is generally non-trivial. Recent work on Riemannian Matérn kernels, based on stochastic partial differential equations and the spectral theory of the Laplace-Beltrami operator, offers a promising avenue towards constructing such geometry-aware kernels. In this paper, we study techniques for implementing these kernels on manifolds of interest in robotics, demonstrate their performance on a set of artificial benchmark functions, and illustrate geometry-aware Bayesian optimization for a variety of robotic applications, covering orientation control, manipulability optimization, and motion planning, while showing improved performance.
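In practice, kernels of this kind are often computed from (approximate) Laplace-Beltrami eigenpairs, for instance obtained from a mesh or graph discretization of the manifold of interest. Below is a hedged NumPy sketch of that spectral form; the cycle-graph Laplacian merely stands in for a proper discretization of a robotics-relevant manifold such as SO(3), and all parameter values are placeholders.

```python
import numpy as np

def matern_spectral_kernel(eigvals, eigvecs, nu=1.5, kappa=1.0, dim=2):
    """Geometry-aware Matérn-type kernel matrix from (approximate) Laplace-Beltrami eigenpairs:
    K = Phi diag(S(lambda)) Phi^T, with S(lambda) = (2 nu / kappa^2 + lambda)^(-nu - dim/2).
    eigvals: (M,) eigenvalues; eigvecs: (V, M) eigenfunctions evaluated at V points."""
    spectrum = (2.0 * nu / kappa**2 + eigvals) ** (-(nu + dim / 2.0))
    spectrum = spectrum / spectrum.sum()         # normalize; the overall scale is a free variance parameter
    return (eigvecs * spectrum) @ eigvecs.T

# Toy usage: a cycle graph standing in for a discretized 1-D manifold.
V = 100
L = 2.0 * np.eye(V) - np.eye(V, k=1) - np.eye(V, k=-1)
L[0, -1] = L[-1, 0] = -1.0                       # periodic boundary -> graph Laplacian of a cycle
eigvals, eigvecs = np.linalg.eigh(L)
K = matern_spectral_kernel(eigvals, eigvecs, nu=2.5, kappa=5.0, dim=1)
```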
Gaussian processes are machine learning models capable of learning unknown functions in a way that represents uncertainty, thereby facilitating the construction of optimal decision-making systems. Motivated by a desire to deploy Gaussian processes in novel areas of science, a rapidly growing line of research has focused on constructively extending these models to handle non-Euclidean domains, including Riemannian manifolds such as spheres and tori. We propose techniques that generalize this class to model vector fields on Riemannian manifolds, which are important in many application areas in the physical sciences. To do so, we present a general recipe for constructing gauge-independent kernels, which induce Gaussian vector fields, i.e. vector-valued Gaussian processes coherent with geometry, from scalar-valued Riemannian kernels. We extend standard Gaussian process training methods, such as variational inference, to this setting. This enables vector-valued Gaussian processes on Riemannian manifolds to be trained using standard methods and makes them accessible to machine learning practitioners.
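One way such a recipe can be realized on the two-sphere is the projection construction sketched below: a scalar kernel is turned into a matrix-valued one by projecting onto tangent spaces, so that samples are tangential vector fields. The scalar kernel used here is a plain ambient squared exponential acting only as a stand-in for a scalar Riemannian kernel, and all names and parameters are illustrative.

```python
import numpy as np

def tangent_projection(x):
    """P_x = I - x x^T projects ambient R^3 vectors onto the tangent plane of S^2 at unit vector x."""
    return np.eye(3) - np.outer(x, x)

def scalar_kernel(x, y, lengthscale=0.5):
    """Placeholder scalar kernel on ambient coordinates (stands in for a Riemannian kernel)."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * lengthscale**2))

def projected_kernel(x, y, lengthscale=0.5):
    """Matrix-valued kernel K(x, y) = P_x (k(x, y) I) P_y^T inducing tangential Gaussian vector fields."""
    return tangent_projection(x) @ (scalar_kernel(x, y, lengthscale) * np.eye(3)) @ tangent_projection(y).T

# Toy usage: covariance of a Gaussian vector field at a few points on S^2, plus one sample.
rng = np.random.default_rng(0)
points = rng.standard_normal((4, 3))
points /= np.linalg.norm(points, axis=1, keepdims=True)
K = np.block([[projected_kernel(x, y) for y in points] for x in points])        # (12, 12), PSD
sample = np.linalg.cholesky(K + 1e-9 * np.eye(K.shape[0])) @ rng.standard_normal(K.shape[0])
field = sample.reshape(4, 3)     # one (approximately) tangential vector per point
```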
Learning physically structured representations of dynamical systems that include contact between different objects is an important problem for learning-based approaches in robotics. Black-box neural networks can learn to approximately represent discontinuous dynamics, but they typically require large amounts of data and often suffer from pathological behaviour when predicting over longer time horizons. In this work, we use the connections between deep neural networks and differential equations to design a family of deep network architectures for representing contact dynamics between objects. We show that these networks can learn discontinuous contact events in a data-efficient manner from noisy observations, in settings that are traditionally difficult for black-box approaches and recent physics-inspired neural networks. Our results indicate that an idealized form of touch feedback, which biological systems rely on heavily, is a key component in making this learning problem tractable. Together with the inductive biases introduced through the network architectures, our technique enables accurate learning of contact dynamics from observations.
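As a purely illustrative toy, and not the architecture of this work, the snippet below shows the structure the abstract alludes to: smooth dynamics integrated step by step, with a discrete contact event, signalled by an idealized touch indicator, that applies an impulse map to the velocity. In the learned setting the smooth vector field and the impact map would be neural networks; here they are hand-coded placeholders.

```python
import numpy as np

def smooth_dynamics(state, g=9.81):
    """Placeholder for a learned smooth vector field: free fall of a point mass (state = [height, velocity])."""
    h, v = state
    return np.array([v, -g])

def impact_map(state, restitution=0.8):
    """Placeholder for a learned impulse map applied when contact is detected."""
    h, v = state
    return np.array([0.0, -restitution * v])

def simulate(state, dt=1e-3, steps=3000):
    """Integrate smooth dynamics with explicit Euler; an idealized touch signal
    (height <= 0 while moving down) triggers the discontinuous contact event."""
    trajectory = [state]
    for _ in range(steps):
        state = state + dt * smooth_dynamics(state)
        touch = state[0] <= 0.0 and state[1] < 0.0       # idealized touch feedback
        if touch:
            state = impact_map(state)
        trajectory.append(state)
    return np.array(trajectory)

traj = simulate(np.array([1.0, 0.0]))    # drop from 1 m; the ball bounces with decaying height
```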
Multi-marginal optimal transport enables one to compare multiple probability measures, which increasingly finds application in multi-task learning problems. A practical limitation of multi-marginal transport is its computational scalability in the number of measures, samples, and dimensions. In this work, we propose a multi-marginal optimal transport paradigm based on random one-dimensional projections, whose (generalized) distance we term the sliced multi-marginal Wasserstein distance. To construct this distance, we introduce a characterization of the one-dimensional multi-marginal Kantorovich problem and use it to highlight a number of properties of the sliced multi-marginal Wasserstein distance. In particular, we show that (i) the sliced multi-marginal Wasserstein distance is a (generalized) metric that induces the same topology as the standard Wasserstein distance, (ii) it admits a dimension-free sample complexity, and (iii) it is tightly connected with the problem of barycentric averaging under the sliced-Wasserstein metric. We conclude by illustrating the sliced multi-marginal Wasserstein distance on multi-task density estimation and multi-dynamics reinforcement learning problems.
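A Monte Carlo sketch of the sliced construction, assuming empirical measures with equal sample sizes and a barycentric cost: project every measure onto a random direction, use the sorted (comonotone) coupling, which is optimal in one dimension, and average the per-quantile spread around the quantile barycenter. This illustrates the idea only and is not reference code from the paper.

```python
import numpy as np

def sliced_multimarginal_wasserstein(measures, n_projections=200, rng=None):
    """Monte Carlo sketch of a sliced multi-marginal Wasserstein distance.
    measures: list of P arrays, each (n, d) with the same n (empirical measures)."""
    rng = np.random.default_rng() if rng is None else rng
    d = measures[0].shape[1]
    total = 0.0
    for _ in range(n_projections):
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)
        # Project and sort each marginal: the sorted (comonotone) coupling is optimal in 1-D.
        sorted_proj = np.stack([np.sort(m @ theta) for m in measures])   # (P, n)
        barycenter = sorted_proj.mean(axis=0)                            # per-quantile barycenter
        total += np.mean((sorted_proj - barycenter) ** 2)                # barycentric cost
    return total / n_projections

# Toy usage: three shifted Gaussian point clouds in 2-D.
rng = np.random.default_rng(0)
clouds = [rng.standard_normal((500, 2)) + shift for shift in ([0, 0], [2, 0], [0, 2])]
print(sliced_multimarginal_wasserstein(clouds, rng=rng))
```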
In the past years, deep learning has seen an increase of usage in the domain of histopathological applications. However, while these approaches have shown great potential, in high-risk environments deep learning models need to be able to judge their own uncertainty and be able to reject inputs when there is a significant chance of misclassification. In this work, we conduct a rigorous evaluation of the most commonly used uncertainty and robustness methods for the classification of Whole-Slide-Images under domain shift using the H\&E stained Camelyon17 breast cancer dataset. Although it is known that histopathological data can be subject to strong domain shift and label noise, to our knowledge this is the first work that compares the most common methods for uncertainty estimation under these aspects. In our experiments, we compare Stochastic Variational Inference, Monte-Carlo Dropout, Deep Ensembles, Test-Time Data Augmentation as well as combinations thereof. We observe that ensembles of methods generally lead to higher accuracies and better calibration and that Test-Time Data Augmentation can be a promising alternative when choosing an appropriate set of augmentations. Across methods, a rejection of the most uncertain tiles leads to a significant increase in classification accuracy on both in-distribution as well as out-of-distribution data. Furthermore, we conduct experiments comparing these methods under varying conditions of label noise. We observe that the border regions of the Camelyon17 dataset are subject to label noise and evaluate the robustness of the included methods against different noise levels. Lastly, we publish our code framework to facilitate further research on uncertainty estimation on histopathological data.
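The rejection mechanism evaluated above can be summarized compactly: average the per-tile class probabilities produced by an ensemble, MC dropout passes, or test-time augmentations, rank tiles by predictive entropy, and drop the most uncertain fraction before computing accuracy. The sketch below uses synthetic arrays in place of real model outputs, and the 20% rejection rate is an illustrative choice.

```python
import numpy as np

def reject_most_uncertain(member_probs, labels, reject_fraction=0.2):
    """member_probs: (M, N, C) class probabilities from M ensemble members / MC dropout passes /
    test-time augmentations for N tiles and C classes. Returns (accuracy on retained tiles, retained fraction)."""
    mean_probs = member_probs.mean(axis=0)                           # (N, C) predictive distribution
    entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-12), axis=1)
    keep = entropy <= np.quantile(entropy, 1.0 - reject_fraction)    # keep the most confident tiles
    preds = mean_probs.argmax(axis=1)
    return (preds[keep] == labels[keep]).mean(), keep.mean()

# Toy usage with synthetic predictions (placeholders for real model outputs).
rng = np.random.default_rng(0)
logits = rng.standard_normal((5, 1000, 2))                           # 5 members, 1000 tiles, 2 classes
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
labels = rng.integers(0, 2, size=1000)
accuracy, retained = reject_most_uncertain(probs, labels)
```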
Charisma is considered to be one's ability to attract and potentially also influence others. Clearly, there can be considerable interest from an artificial intelligence's (AI) perspective to provide it with such skill. Beyond that, a plethora of use cases opens up for computational measurement of human charisma, such as for tutoring humans in the acquisition of charisma, mediating human-to-human conversation, or identifying charismatic individuals in big social data. A number of models exist that base charisma on various dimensions, often following the idea that charisma is given if someone could and would help others. Examples include influence (could help) and affability (would help) in scientific studies or power (could help), presence, and warmth (both would help) as a popular concept. Modelling high levels in these dimensions for humanoid robots or virtual agents seems accomplishable. Beyond this, automatic measurement also appears quite feasible with the recent advances in the related fields of Affective Computing and Social Signal Processing. Here, we therefore present a blueprint for building machines that can appear charismatic, but also analyse the charisma of others. To this end, we first provide the psychological perspective including different models of charisma and behavioural cues of it. We then switch to conversational charisma in spoken language as an exemplary modality that is essential for human-human and human-computer conversations. The computational perspective then deals with the recognition and generation of charismatic behaviour by AI. This includes an overview of the state of play in the field and the aforementioned blueprint. We then name exemplary use cases of computational charismatic skills before switching to ethical aspects and concluding this overview and perspective on building charisma-enabled AI.
Deep learning-based 3D human pose estimation performs best when trained on large amounts of labeled data, making combined learning from many datasets an important research direction. One obstacle to this endeavor is the different skeleton formats provided by different datasets, i.e., they do not label the same set of anatomical landmarks. There is little prior research on how to best supervise one model with such discrepant labels. We show that simply using separate output heads for different skeletons results in inconsistent depth estimates and insufficient information sharing across skeletons. As a remedy, we propose a novel affine-combining autoencoder (ACAE) method to perform dimensionality reduction on the number of landmarks. The discovered latent 3D points capture the redundancy among skeletons, enabling enhanced information sharing when used for consistency regularization. Our approach scales to an extreme multi-dataset regime, where we use 28 3D human pose datasets to supervise one model, which outperforms prior work on a range of benchmarks, including the challenging 3D Poses in the Wild (3DPW) dataset. Our code and models are available for research purposes.
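The central building block here is the affine combination: each latent keypoint is a weighted sum of input landmarks with weights summing to one, which makes the latent points transform consistently under any rotation and translation of the pose. The sketch below demonstrates that equivariance property with random weights (kept positive for conditioning, although affine weights may in general be negative); it is not the trained ACAE, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_joints, n_latent = 28, 16                               # illustrative sizes, not the paper's

# Affine-combination weights: each latent point is a weighted sum of joints, rows sum to 1.
W_enc = rng.uniform(0.1, 1.0, (n_latent, n_joints))
W_enc /= W_enc.sum(axis=1, keepdims=True)

def encode(pose):
    """pose: (n_joints, 3) -> latent 3-D points (n_latent, 3) via affine combinations."""
    return W_enc @ pose

# Equivariance check: affine combinations commute with rotations and translations of the pose.
pose = rng.standard_normal((n_joints, 3))
angle = 0.7
R = np.array([[np.cos(angle), -np.sin(angle), 0.0],
              [np.sin(angle),  np.cos(angle), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([0.5, -1.0, 2.0])

latent_then_transform = encode(pose) @ R.T + t
transform_then_latent = encode(pose @ R.T + t)
assert np.allclose(latent_then_transform, transform_then_latent)
```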
This article concerns Bayesian inference using deep linear networks with output dimension one. In the interpolating (zero noise) regime we show that with Gaussian weight priors and MSE negative log-likelihood loss both the predictive posterior and the Bayesian model evidence can be written in closed form in terms of a class of meromorphic special functions called Meijer-G functions. These results are non-asymptotic and hold for any training dataset, network depth, and hidden layer widths, giving exact solutions to Bayesian interpolation using a deep Gaussian process with a Euclidean covariance at each layer. Through novel asymptotic expansions of Meijer-G functions, a rich new picture of the role of depth emerges. Specifically, we find that the posteriors in deep linear networks with data-independent priors are the same as in shallow networks with evidence maximizing data-dependent priors. In this sense, deep linear networks make provably optimal predictions. We also prove that, starting from data-agnostic priors, Bayesian model evidence in wide networks is only maximized at infinite depth. This gives a principled reason to prefer deeper networks (at least in the linear case). Finally, our results show that with data-agnostic priors a novel notion of effective depth given by \[\#\text{hidden layers}\times\frac{\#\text{training data}}{\text{network width}}\] determines the Bayesian posterior in wide linear networks, giving rigorous new scaling laws for generalization error.
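As a purely illustrative arithmetic instance with hypothetical numbers (not taken from the paper): a network with 10 hidden layers, 1000 training examples, and width 500 would, under this notion, have effective depth \[\#\text{hidden layers}\times\frac{\#\text{training data}}{\text{network width}} = 10 \times \frac{1000}{500} = 20,\] which is the quantity that determines the Bayesian posterior in the wide linear regime described above.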
In this paper we study the smooth strongly convex minimization problem $\min_{x}\min_y f(x,y)$. The existing optimal first-order methods require $\mathcal{O}(\sqrt{\max\{\kappa_x,\kappa_y\}} \log 1/\epsilon)$ of computations of both $\nabla_x f(x,y)$ and $\nabla_y f(x,y)$, where $\kappa_x$ and $\kappa_y$ are condition numbers with respect to variable blocks $x$ and $y$. We propose a new algorithm that only requires $\mathcal{O}(\sqrt{\kappa_x} \log 1/\epsilon)$ of computations of $\nabla_x f(x,y)$ and $\mathcal{O}(\sqrt{\kappa_y} \log 1/\epsilon)$ computations of $\nabla_y f(x,y)$. In some applications $\kappa_x \gg \kappa_y$, and computation of $\nabla_y f(x,y)$ is significantly cheaper than computation of $\nabla_x f(x,y)$. In this case, our algorithm substantially outperforms the existing state-of-the-art methods.